Fault Tolerance


Production applications need to be protected against
the possibility of catastrophic failure. Disks fill up...
hardware fails... operating systems crash... networks go
down... but with proper foresight these situations do not
have to lead to a loss of objects. This column describes
mechanisms to achieve fault tolerance and how to recover
when the bad things happen.

There are at least two ways in which systems achieve
fault tolerance. One is to prevent the system from going
down in the first place; the other is to bring the system
back to a consistent state if it does go down. The typical
way to keep a system from going down is to duplicate, or
mirror, the state of the object repository on different
hardware, so that if the primary piece fails, the system
automatically switches over to the duplicate. To bring the
system back up after it goes down, most transaction-based
systems employ backup files and transaction logs to help
the system recover to a consistent state. These same
approaches apply to Smalltalk applications.

In multi-user Smalltalk, the object repository is
manifested by one or more files (or possibly raw disk
partitions) called extents. These are where the state of
objects ultimately resides. For fault tolerance, as well as
for performance reasons, information about objects may
first be written to other files, called transaction logs.
Transaction logs contain the information needed to re-do
transactions that have been committed to the repository.

When a transaction is committed, it is considered
complete as soon as its transaction log records have been
completely written. The extent files do not have to be
updated with new or changed objects immediately, which
can improve overall system performance and transaction
throughput.
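
The idea of committing by writing only to the log can be
sketched with a toy model. This is a minimal illustration in
Python, not GemStone's implementation; the class and method
names (ToyRepository, commit, flush) are hypothetical.

```python
# Toy model of commit-by-logging; not GemStone's implementation.
class ToyRepository:
    def __init__(self):
        self.extent = {}   # stands in for the extent files
        self.log = []      # stands in for the transaction log
        self.applied = 0   # log records already written to the extent

    def commit(self, changes):
        """A commit only appends a redo record; the extent is untouched."""
        self.log.append(dict(changes))

    def flush(self):
        """Later, apply pending redo records to the extent in commit order."""
        for record in self.log[self.applied:]:
            self.extent.update(record)
        self.applied = len(self.log)


repo = ToyRepository()
repo.commit({"account": 100})
repo.commit({"account": 80, "audit": "withdrew 20"})
extent_before_flush = dict(repo.extent)   # both commits are done, extent unchanged
repo.flush()
```

Both commits complete without touching the extent; the
extent catches up only when the pending records are flushed.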

To keep a multi-user Smalltalk system from going down,
the system administrator can specify that the extent files
are to be replicated. In addition to allocating extent
files across multiple disk drives on different machines for
performance and clustering reasons, the system
administrator can allocate the replicated extents on
multiple disk drives as well. While the system is running,
if a client or server process encounters a read error on a
primary extent file, the corresponding replicated extent
file is automatically used instead.

In GemStone, the system administrator creates replicated
extents in two ways. One way is to specify them in the
configuration file used by the server process at startup
time. Another way is to dynamically create new replicates
at runtime by sending the message SystemRepository
createReplicateOf: extentFilename named: replicateFilename.
In both cases, you are mapping a primary extent file to a
corresponding replicated extent. The replicated extent
should be located on a different disk spindle to reduce I/O
contention, as well as to provide fault tolerance.

Even if the object repository is replicated for
automatic switchover, it is still good practice to plan for
recovery if the system goes down entirely. This planning
involves deciding how often to back up the system and how
quickly the system must be back online. For 7x24 production
applications, it is imperative that backups be performed
while the system is online and other users are logged in.
Since backups may require considerable resources for large
object repositories, it is desirable to limit the I/O rate
of the process performing the backup to reduce its
interference with other sessions.

To plan for backups and recovery, it is necessary to
understand how transaction logging works. As mentioned
earlier, transaction logs contain the information to re-do
transactions that have been committed. Transaction logs
are used to recover from an unexpected shutdown or to roll
forward from a backup file. When configuring a system, an
administrator supplies multiple locations where
transaction logs are to be written, so that if one disk
becomes full, the system can automatically switch over to
the next location. It is also possible to configure the
maximum size of each transaction log file to balance the
utilization of disk resources. Transaction logs can be
replicated to provide the same benefits as replicated
extents.
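
The switchover between log locations and the per-file size
limit can be illustrated with a small simulation. This is a
sketch under stated assumptions, not GemStone's algorithm:
the LogWriter class, the location names, and the sizes are
all hypothetical, and disk space is modeled as a simple
counter.

```python
# Toy model of log-location switchover; names and sizes are hypothetical.
class LogWriter:
    def __init__(self, locations, max_log_size):
        self.locations = locations        # list of [name, free_space] pairs
        self.max_log_size = max_log_size  # size cap for each log file
        self.current = 0                  # index of the location in use
        self.file_id = 1                  # increasing id of the current log file
        self.file_size = 0                # bytes written to the current log file

    def write(self, record_size):
        """Write one record; return (location name, log file id) used."""
        # A new log file is needed when the current one would exceed its cap.
        new_file = self.file_size + record_size > self.max_log_size
        # Move to the next configured location while the current disk is full.
        while self.locations[self.current][1] < record_size:
            self.current += 1
            if self.current == len(self.locations):
                raise RuntimeError("all log locations are full")
            new_file = True
        if new_file:
            self.file_id += 1
            self.file_size = 0
        self.locations[self.current][1] -= record_size
        self.file_size += record_size
        return self.locations[self.current][0], self.file_id


writer = LogWriter([["diskA", 45], ["diskB", 100]], max_log_size=20)
placements = [writer.write(10) for _ in range(5)]
```

In this run the first four records fill two log files on
diskA; the fifth no longer fits there, so a third log file
is started on diskB.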

Recall that objects may not be written immediately to
extent files. To force the information in transaction logs
to be written to the extent files, an administrator
performs a checkpoint. Performing a checkpoint reduces the
number of transaction logs that have to be applied when the
system recovers from a crash in which the extent files are
not damaged.

Transaction logging can be set up to handle two kinds of
recovery situations. In the first situation, the system has
unexpectedly shut down, but the extent files are not
corrupt. To recover the object repository to the last
committed state, only transaction log records that were
written since the last checkpoint are applied. This mode of
transaction logging is called partial logging, since not
all transaction logs are needed to recover. To free up
space, an administrator can remove any log files written
prior to the most recent checkpoint, usually leaving the
current log and the one immediately before.
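
Recovery in partial logging mode can be sketched as
replaying only the records written after the last
checkpoint. The functions and data layout below are
assumptions made for illustration (log files as lists of
records keyed by file id, a checkpoint as a file id plus a
record index); they do not describe GemStone's on-disk
format.

```python
# Toy model of partial-logging recovery; file ids and records are made up.
def recover(extent, log_files, checkpoint):
    """Redo the records written after the checkpoint.

    extent      -- dict holding the state as of the last checkpoint
    log_files   -- {file_id: [record, ...]} with increasing file ids
    checkpoint  -- (file_id, index) of the first record NOT yet in the extent
    """
    ckpt_file, ckpt_index = checkpoint
    recovered = dict(extent)
    for file_id in sorted(log_files):
        if file_id < ckpt_file:
            continue                      # fully covered by the checkpoint
        start = ckpt_index if file_id == ckpt_file else 0
        for record in log_files[file_id][start:]:
            recovered.update(record)
    return recovered


def removable(log_files, checkpoint):
    """Ids of log files written entirely before the checkpoint's file."""
    return [file_id for file_id in sorted(log_files) if file_id < checkpoint[0]]


log_files = {
    1: [{"x": 1}, {"y": 2}],
    2: [{"x": 5}, {"z": 9}],
    3: [{"y": 7}],
}
extent_at_checkpoint = {"x": 5, "y": 2}   # state through log 2, record 0
checkpoint = (2, 1)

state = recover(extent_at_checkpoint, log_files, checkpoint)
old_logs = removable(log_files, checkpoint)
```

Only the second record of log 2 and all of log 3 are
replayed, and log 1 is the only file that could be removed
to free space.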

In partial logging mode, the frequency of performing a
checkpoint helps control how long it takes to recover the
system. In GemStone, the system can be set up to
automatically perform checkpoints at specific intervals by
setting a configuration parameter; or, a checkpoint can be
performed explicitly by sending System checkpoint. When the
system is in partial logging mode, a checkpoint is also
triggered when any transaction writes a log record whose
size is greater than some configurable threshold.

The second kind of recovery situation occurs when the
system crashes and the extent files are corrupt. In this
case, the object repository must be recovered from backup
files. To recover from this situation, all transaction logs
that were written since the backup are needed. This type of
recovery is supported by configuring the system to be in
full logging mode. Full transaction logging should be used
for production applications, to guarantee recoverability in
the face of media failure.

One factor determining the time to recover from a backup
is the frequency of backups performed: the more recent the
backup, the fewer transaction logs have to be applied. To
perform a backup of the object repository in GemStone, a
user sends the message SystemRepository fullBackupTo:
aFileOrDevice Mbytes: aByteLimit. The first argument
specifies the file, raw partition, or device where the
backup is to be created. The second argument specifies a
byte limit so that you can create multiple backup files by
limiting the size of each part.

Once the first backup file has been written, you
continue writing the next part of the backup with the
message SystemRepository continueFullBackupTo:
aFileOrDevice Mbytes: aByteLimit. Since the backup
procedure may consume system resources, a user can control
the I/O rate of the current backup session by sending
System configurationAt: #GemIOLimit put: 10. This example
allows a maximum of 10 I/Os per second.

To restore the object repository, a system administrator
first starts a server process on a new object repository.
Then the restore operation is performed by sending
SystemRepository restoreFromBackup: backupFilename. At this
point, the state of the repository is the same as when the
backup file was created. Now the administrator can apply
transaction logs to roll forward from the state of the
backup to the state of the last committed transaction.

To find out the first transaction log file needed, the
administrator sends SystemRepository restoreStatus to get
the file id of the log file. When transaction log files are
created, they are given a filename that includes an
increasing numerical file id so that the sequence of file
creation is evident. This helps in determining which
transaction log files to archive (i.e., move somewhere
else) and which are needed for restoration. If the needed
transaction log files have been archived, the administrator
sends SystemRepository restoreFromLog: aTranLogFilename to
explicitly specify their location. If the remaining log
files are in their original location, the administrator
sends SystemRepository restoreFromCurrentLogs. The
administrator sends the message SystemRepository
commitRestore to finish the restoration and allow other
users to log in. It is also possible to restore to a
specific point in time by sending SystemRepository
timeToRestoreTo: aDateTime before restoring from
transaction logs.
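
The roll-forward procedure can be modeled in a few lines.
This is a hedged sketch, not GemStone code: the restore
function, the integer commit times, and the dictionary
layout of the log files are assumptions chosen to show the
logic of starting from a backup, applying logs in file-id
order, and optionally stopping at a point in time.

```python
# Toy model of restore plus roll-forward; data layout is hypothetical.
def restore(backup, log_files, first_needed_id, restore_to=None):
    """Rebuild state from a backup plus later transaction logs.

    backup          -- dict with the state captured by the backup
    log_files       -- {file_id: [(commit_time, changes), ...]}
    first_needed_id -- id of the first log file needed after the backup
    restore_to      -- optional commit time; later commits are not applied
    """
    needed = [i for i in sorted(log_files) if i >= first_needed_id]
    expected = list(range(first_needed_id, first_needed_id + len(needed)))
    if needed != expected:
        raise ValueError("a needed transaction log file is missing")
    state = dict(backup)
    for file_id in needed:
        for commit_time, changes in log_files[file_id]:
            if restore_to is not None and commit_time > restore_to:
                return state
            state.update(changes)
    return state


backup = {"a": 1}
log_files = {
    3: [(10, {"a": 0})],                  # written before the backup
    4: [(20, {"a": 2}), (30, {"b": 3})],
    5: [(40, {"a": 4})],
}
latest = restore(backup, log_files, first_needed_id=4)
as_of_30 = restore(backup, log_files, first_needed_id=4, restore_to=30)
```

The increasing file ids are what make the gap check
possible: if log 4 had been archived and not supplied, the
function would refuse to roll forward from log 5 alone.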

Using transaction logs, a 'warm' backup system can be
built with the mechanisms described above. A 'warm' backup
system is a duplicated object repository that is not kept
in sync with the primary repository in real time by the
underlying system. Instead, the duplicated object
repository is explicitly synchronized with the primary
repository at specific time intervals. The advantage of a
warm backup is that it places no burden on the primary
system to perform I/O to multiple locations; the
disadvantage is that the warm backup is only as up-to-date
as the last time it was explicitly synchronized with the
primary system.

To build a warm backup system, a server process is
started up on a copy of the primary object repository (or
it could be started up on a new repository, then restored
from a backup file of the primary repository). This is the
warm backup server. Next, a process is spawned that
continually looks for new transaction logs being created by
the primary server.

When a new transaction log file is created, this process
can copy the previous log file to the backup site and
perform SystemRepository restoreFromLog: aTranLogFilename.
If the primary repository goes down, the warm backup site
performs SystemRepository commitRestore, and it is ready
for duty.
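
The warm backup loop can be sketched as a standby that
applies each completed log file and skips the one the
primary is still writing. The WarmBackup class and its
methods are hypothetical names for this illustration; a
real setup would copy files between sites and send the
GemStone messages named above.

```python
# Toy model of a warm backup site; class and method names are hypothetical.
class WarmBackup:
    """Standby that applies completed log files shipped from the primary."""

    def __init__(self, base_state, next_log_id):
        self.state = dict(base_state)   # copy or restored backup of the primary
        self.next_log_id = next_log_id  # first log file not yet applied
        self.online = False

    def sync(self, primary_logs, current_log_id):
        """Apply every completed log file; skip the one still being written."""
        applied = []
        while self.next_log_id < current_log_id:
            for changes in primary_logs[self.next_log_id]:
                self.state.update(changes)
            applied.append(self.next_log_id)
            self.next_log_id += 1
        return applied

    def take_over(self):
        """Stop applying logs and start serving users."""
        self.online = True
        return self.state


primary_logs = {1: [{"a": 1}], 2: [{"a": 2}, {"b": 1}], 3: [{"c": 9}]}
standby = WarmBackup({}, next_log_id=1)
first_sync = standby.sync(primary_logs, current_log_id=2)   # log 2 still open
second_sync = standby.sync(primary_logs, current_log_id=3)  # log 3 still open
final_state = standby.take_over()
```

The change in log 3 never reaches the standby in this run,
which is the trade-off described earlier: the warm backup
is only as current as its last synchronization.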

Fault tolerance is a necessary consideration for
production applications. System administrators need to plan
for disaster and have the mechanisms in place to recover.
Duplicated object repositories and transaction logging are
two mechanisms that provide the functionality needed for
7x24 applications.